Abstract

Receiver operating characteristic (ROC) analysis is widely used to analyse the
performance of classifiers in data mining. An important topic in ROC analysis is the
ROC convex hull (ROCCH), which is the least convex majorant (LCM) of the empirical
ROC curve and covers the potential optima for a given set of classifiers. Generally,
ROC performance maximization can be viewed as maximizing the ROCCH, which means
maximizing the true positive rate (tpr) and minimizing the false positive rate (fpr)
of each classifier in ROC space. However, tpr and fpr conflict with each other in the
ROCCH optimization process. Although the ROCCH maximization problem looks like a
multi-objective optimization problem (MOP), its special characteristics make it
different from a traditional MOP. In this work, we discuss the difference between them
and propose convex hull-based multi-objective genetic programming (CH-MOGP) to solve
ROCCH maximization problems. Convex hull-based sorting is an indicator-based selection
scheme that aims to maximize the area under the convex hull, which serves as a unary
indicator for the performance of a set of points. A selection procedure is described
that can be implemented efficiently and follows design principles similar to those of
classical hypervolume-based optimization algorithms. It is hypothesized that, by using
a tailored indicator-based selection scheme, CH-MOGP is more efficient for ROC convex
hull approximation than algorithms that compute all Pareto optimal points. To test
this hypothesis, we compare the new CH-MOGP to MOGP with classical selection schemes,
including the Non-dominated Sorting Genetic Algorithm II (NSGA-II), the
Multi-objective Evolutionary Algorithm Based on Decomposition (MOEA/D) and
Multi-objective Selection Based on Dominated Hypervolume (SMS-EMOA). Experimental
results on 22 well-known UCI data sets show that CH-MOGP significantly outperforms
traditional EMOAs.

1. Introduction

Traditionally, a classification task is to assign the items (instances) in a data set
to target categories (classes) using classifier(s) learnt from training instances. In
binary classification there are only two classes, and every instance in the data set
is assigned to one of them. The goal of a classification problem is to design
classifiers that make error-free assignments.

The ROC graph is a technique for visualizing, organizing and selecting classifiers
based on their performance [1]. A salient topic in ROC analysis is generating ROC
curves by varying the discriminative threshold over the output of a classifier [1].
Over the past 40 years, the ROC technique has been widely applied in many research and
application areas, such as signal detection [2], medical decision making [3] and
diagnostic systems [4].

Although the ROC curve works well in many cases, research attention has recently also
been drawn to another perspective of ROC analysis, namely the ROC convex hull (ROCCH).
ROCCH focuses on the convex hull of a set of points (hard classifiers) obtained either
from several curves (i.e., soft classifiers) or from hard classifiers themselves. A
classifier is potentially optimal if and only if it lies on the ROCCH; in other words,
the ROCCH can offer better choices for a specific environment than a single ROC curve.
The significance of the ROCCH in ROC analysis is that, for test data sets with
different skewed class distributions or misclassification costs, it is always possible
to choose suitable classifiers by means of an iso-performance line, which is derived
from the operating conditions of the classifiers and is used to identify a portion of
the ROCCH [5]. Consequently, this paper emphasizes the ROCCH: we focus on searching
for a group of independent hard classifiers that maximize ROCCH performance, rather
than trying to maximize the area under the ROC curve (AUC) of a single soft
classifier.
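The role of the iso-performance line can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure of the cited work: it uses the standard slope of an iso-performance line in ROC space, (P(negative) x cost of a false positive) / (P(positive) x cost of a false negative), and selects the point that the line touches first when moved in from the top-left corner. The function names and the default costs are hypothetical.

```python
def iso_performance_slope(p_pos, cost_fp=1.0, cost_fn=1.0):
    """Slope of the iso-performance lines for the given operating conditions."""
    if not 0.0 < p_pos < 1.0:
        raise ValueError("p_pos must be strictly between 0 and 1")
    return ((1.0 - p_pos) * cost_fp) / (p_pos * cost_fn)


def best_operating_point(points, p_pos, cost_fp=1.0, cost_fn=1.0):
    """Pick the (fpr, tpr) point with the largest intercept tpr - slope * fpr."""
    slope = iso_performance_slope(p_pos, cost_fp, cost_fn)
    return max(points, key=lambda p: p[1] - slope * p[0])
```

For example, with hull points (0, 0), (0.1, 0.5), (0.3, 0.9) and (1, 1), balanced classes and equal costs select (0.3, 0.9), whereas a positive-class prior of 0.1 selects the conservative point (0, 0).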

Essentially, the ROCCH is the collection of all potentially optimal classifiers in a
given set of classifiers, so ROCCH maximization means finding a group of classifiers
whose performance approaches the top and left axes of ROC space as closely as
possible. However, ROCCH maximization is not an easy task, and few works have focused
on how to maximize the ROCCH, although it is an important topic in classification.
Generally, existing works fall into two categories: machine learning methods based on
ROC geometric analysis, and evolutionary computation methods based on multi-objective
optimization strategies.

Fawcett et al. [6] employed C4.5 and Rule Learning (RL) systems to induce decision
rules in ROC space, and an advanced version, PRIE, was introduced in [7]. Analysing
geometrical properties is a straightforward way to generate decision rules that
maximize ROC performance. However, the procedure easily gets trapped in local optima.

The concavity problem in ROC analysis was studied by Flach et al. [8], who
demonstrated how to detect and repair concavities in ROC curves. The basic idea of
that work is that a point in a concavity can be mirrored to a better point that lies
beyond the original ROC curve. However, it is not a general method for maximizing ROC
performance.

ROCCER was introduced by Prati et al. in [9]. It was argued that, compared with set
covering algorithms, ROCCER is less dependent on previously induced rules when
constructing rule sets that have a convex hull in ROC space. However, it adopts an
association rule learner to generate new rules that cover the instance space as fully
as possible. It overfits easily, because it needs many rules to cover the space, much
like a very deep decision tree.

The Neyman-Pearson lemma is given in [10] as the theoretical basis for finding the
optimal combination of classifiers to maximize the ROCCH. In contrast to the similar
technique in [8], it not only focuses on repairing concavities but also on improving
performance when there is no concavity. For a given rule set, the method proposed in
[10] can efficiently combine the rules using AND and OR to obtain the optimal rule
subset. However, like the methods mentioned above, it lacks a scheme for generating
new rules when searching the global rule set.

Maximizing the ROCCH means searching for a group of classifiers that ideally minimize
fpr and maximize tpr simultaneously, i.e., that are located as far to the left and to
the top of ROC space as possible. However, it is very hard to optimize fpr and tpr
simultaneously because they are conflicting targets. From this perspective, the ROCCH
maximization problem is similar to a multi-objective optimization problem.

Zhao [11] introduced a specific non-dominance relationship into a multi-objective
optimization framework to optimize tpr and 1-fpr. However, that work concentrated on
cost-sensitive classification and used many rules based on misclassification costs to
rank the individuals in its multi-objective genetic programming. First, it is not a
general method for ROCCH maximization, because it only addresses the cost-sensitive
problem. Second, the two data sets used in its experiments are too few to evaluate the
proposed method.

Bhowan et al. searched the Pareto front to maximize the accuracy on the minority
class for unbalanced data sets [12], and they also employed multi-objective
optimization techniques to evolve diverse ensembles using genetic programming to
maximize classification performance in [13].

Wang et al. investigated several EMOAs, such as NSGA-II [14], MOEA/D [15], SMS-EMOA
[16] and the Approximation-Guided Evolutionary Multi-objective Algorithm (AG-EMOA)
[17]. These evolutionary multi-objective optimization frameworks were combined with
genetic programming to maximize ROC performance [18].

However, the ROCCH is different from the Pareto front, although the two have been
reported to be similar [19]. The ROCCH is the collection of points that form the
convex hull of the existing classifiers in ROC space, whereas the Pareto front is the
collection of points in the first level of a sort by the dominance relationship.
Although evolutionary multi-objective algorithms (EMOAs) have been used successfully
for ROCCH maximization, these EMO techniques do not take into account a special
characteristic of the ROCCH: by mixing two real classifiers, we can construct a
virtual classifier whose performance is any point on the line segment connecting the
two corresponding points [19]. Consequently, hard classifiers in concave parts of the
Pareto front can always be replaced by classifier combinations that yield dominating
points. The computational resources for approximating concave parts are thus better
spent on accurately approximating only those parts of the Pareto front that belong to
the convex hull.

In [20], [21] and [22], the concept of the convex hull was employed in EMOAs to speed
up sorting or to maintain a well-distributed set of non-dominated solutions. These
works provide useful ideas for convex hull-based sorting. In [23] and [24], convex
hull-based ranking was combined with evolutionary multi-objective optimization and
fuzzy rule-based binary classifiers to maximize the ROCCH in ROC space. However, in
the first work the number of levels was pre-defined as three without explanation, and
the second treated the task as a bi-objective optimization problem whose objectives
were classification accuracy and the complexity of the classifier rules.

Moreover, instead of designing algorithms based on Pareto dominance compliant
performance indicators, such as the hypervolume indicator used in [16] and [25], it
seems more promising to direct the algorithm straight at the maximization of the area
under the convex hull (AUCH).
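As a small illustration of the indicator itself, the AUCH can be computed with the trapezoidal rule once the hull vertices are known. The sketch below is an assumption-laden example rather than the paper's implementation: it assumes the vertices of the ROCCH are already available as (fpr, tpr) pairs, including the trivial classifiers (0, 0) and (1, 1), and the function name `area_under_hull` is hypothetical.

```python
def area_under_hull(vertices):
    """Trapezoidal area under a piecewise-linear hull given as (fpr, tpr) vertices."""
    pts = sorted(vertices)  # order the vertices by increasing fpr
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Area of the trapezoid between two neighbouring vertices.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A perfect classifier at (0, 1) gives an AUCH of 1, while the diagonal from (0, 0) to (1, 1) gives 0.5.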

In this paper, we combine Genetic Programming (GP) with multi-objective techniques to
obtain the optimal ROCCH. Two strategies are presented. The first is convex hull-based
sorting without redundancy, which divides the GP population into several levels, much
like non-dominated sorting in NSGA-II. The second is an area-based contribution used
to select the survivors within the same level; here we use the (μ + μ) selection
strategy as in [25]. We show that convex hull-based sorting without redundancy plays a
key role in multi-objective genetic programming (MOGP) for maximizing ROCCH
performance, and that the area-based contribution selection scheme further improves
performance.

This paper is organized as follows. Section 2 discusses the relationship between
ROCCH optimization and traditional multi-objective optimization in detail. Convex
hull-based multi-objective genetic programming (CH-MOGP) is described in Section 3.
Experiments are reported in Section 4 and show the advantages of the new algorithm.
Section 5 gives the conclusions and discusses important aspects and future
perspectives of this work.

2. ROC Convex Hull and Multi-objective Optimization 

2.1. What is ROCCH? 

Basically, ROC analysis concerns the confusion matrix of a classifier's outputs, from
which performance can be analysed through metrics such as accuracy, precision,
specificity and sensitivity. The ROC graph (left side of Fig. 1) plots tpr on the Y
axis against fpr on the X axis, both of which are defined from the confusion matrix.
Each classifier can be mapped to a point in the ROC graph according to its
performance. Essentially, the ROCCH is the collection of all potentially optimal
classifiers in a given set of classifiers (right side of Fig. 1). Furthermore, a
classifier is potentially optimal if and only if it lies on the convex hull of the set
of points in ROC space [5].
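The hull of a set of classifiers can be extracted with any planar convex hull routine. The following minimal Python sketch uses Andrew's monotone chain, which is a substitute chosen for brevity and is not the Jarvis algorithm used later in Algorithm 1. It assumes each classifier is given as an (fpr, tpr) pair and that the trivial classifiers (0, 0) and (1, 1) are always available; the function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rocch(points):
    """Vertices of the ROC convex hull from (0, 0) to (1, 1), ordered by fpr."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it lies on or below the segment hull[-2] -> p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

With the classifiers (0.1, 0.5), (0.2, 0.6), (0.3, 0.9) and (0.5, 0.6), only (0.1, 0.5) and (0.3, 0.9) remain on the hull, so only these two are potentially optimal.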




Figure 1 : ROC graph and ROC Convex hull in ROC Space 

2.2. ROCCH maximization problem and multi-objective optimization problem 

The ROCCH maximization problem aims at finding a group of solutions (classifiers)
that approximate the top edge and the left edge of ROC space as closely as possible.
However, minimizing fpr and maximizing tpr simultaneously are conflicting goals,
because if a classifier labels more instances as positive, it labels fewer as
negative, and vice versa. From this perspective, ROCCH maximization is generally
considered a multi-objective optimization problem, which can be described as follows:

\[
\begin{aligned}
\text{maximize} \quad & F(x) = \big(f_{tpr}(x),\; f_{1-fpr}(x)\big) \\
\text{subject to} \quad & x \in \Omega
\end{aligned}
\tag{1}
\]

In Eq. (1), $x$ is a classifier in the solution space $\Omega$ and $F(x)$ is the
vector function of the tpr and fpr of the classifier. An important term in MOP is
dominance, which can be defined as follows. Let $u = (u_1, \ldots, u_m)$ and
$v = (v_1, \ldots, v_m)$ be two vectors; $u$ is said to dominate $v$ if $u_i \le v_i$
for all $i = 1, \ldots, m$ and $u \ne v$, which is denoted $u \prec v$ (this
definition is written for minimization; for the maximization form of Eq. (1) the
inequalities are reversed). If $u$ and $v$ do not dominate each other, we say that
$u$ and $v$ are nondominated. A nondominated set is a set in which no item dominates
any other. A point $x^*$ is called Pareto optimal if there is no $x \in \Omega$ such
that $F(x)$ dominates $F(x^*)$ [15], [26]. The Pareto set (PS) is the collection of
all Pareto optimal points. The Pareto front is the set of all Pareto objective
vectors, $PF = \{F(x) \mid x \in PS\}$.
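The dominance relation and the nondominated set can be sketched as follows. This is a minimal Python illustration, assuming that both objectives are maximized as in Eq. (1), so the comparison uses "greater than or equal"; the function names are hypothetical and the quadratic-time filter is chosen for clarity, not speed.

```python
def objectives(fpr, tpr):
    """Objective vector (tpr, 1 - fpr) of a classifier; both entries are maximized."""
    return (tpr, 1.0 - fpr)


def dominates(u, v):
    """True if u dominates v when every objective is maximized."""
    return all(a >= b for a, b in zip(u, v)) and tuple(u) != tuple(v)


def nondominated(vectors):
    """Keep the vectors that no other vector in the list dominates."""
    return [u for u in vectors if not any(dominates(v, u) for v in vectors)]
```

For the objective vectors (0.5, 0.9), (0.9, 0.7), (0.6, 0.8) and (0.4, 0.8), the last one is dominated by (0.5, 0.9) and the first three form the nondominated set.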

Most evolutionary multi-objective algorithms use pairwise dominance to describe the
relationship between two solutions. However, ROCCH maximization in ROC space has a
special characteristic. Fig. 2 shows the convex hull and the Pareto front for a set of
points. Clearly, the convex hull is different from the Pareto front, although the two
have been argued to be similar [19]. For example, points a, b and c in Fig. 2 form a
non-dominated set in a traditional multi-objective optimization problem; however, a
classifier on the line connecting a and c would dominate b. This special
characteristic takes ROCCH maximization beyond traditional multi-objective
optimization. Therefore, we need to design new techniques for searching for a group of
classifiers with maximum ROCCH.




Figure 2: Pareto front and convex hull 



2.3. Nondominated sort does harm to EMOAs in ROCCH maximization 

The fundamental reason why we want the convex hull rather than the Pareto front is
that two classifiers can be combined to produce any classifier whose ROC performance
lies on the line connecting the two points that represent the original classifiers in
ROC space [19]. As shown on the left side of Fig. 3, classifiers with performance at
points d and h can be used to construct a virtual classifier with performance at e, on
the line connecting d and h. This is a special and important characteristic of the
ROCCH maximization problem.
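The construction of a virtual classifier can be made explicit with a short Python sketch. It assumes the usual randomized mixing scheme: for each instance, one of the two real classifiers is chosen at random, so the expected (fpr, tpr) of the mixture is the corresponding convex combination of the two points. All function names here are hypothetical.

```python
def mixing_weight(d, h, target_fpr):
    """Probability of using classifier h so that the mixture reaches target_fpr."""
    if d[0] == h[0]:
        raise ValueError("the two classifiers must have different fpr values")
    w = (target_fpr - d[0]) / (h[0] - d[0])
    if not 0.0 <= w <= 1.0:
        raise ValueError("target_fpr must lie between the two classifiers")
    return w


def mixed_performance(d, h, w):
    """Expected (fpr, tpr) when h is used with probability w and d otherwise."""
    return ((1 - w) * d[0] + w * h[0], (1 - w) * d[1] + w * h[1])


def mixed_classifier(clf_d, clf_h, w, rng):
    """Virtual classifier: delegate each prediction to clf_h with probability w."""
    def predict(x):
        return clf_h(x) if rng.random() < w else clf_d(x)
    return predict
```

For instance, mixing classifiers at (0.1, 0.5) and (0.3, 0.9) with equal probability gives an expected performance of about (0.2, 0.7), which dominates a real classifier at (0.2, 0.6) even though that classifier is nondominated with respect to the two originals.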




Figure 3: Nondominated sorting keeps an individual that contributes nothing to the ROCCH

On the right side of Fig. 3, all the points are mutually nondominated and belong to
the convex hull, except for point a. However, if we use crowding-distance selection or
hypervolume contribution-based selection to choose one individual to discard from the
population, point a will be selected to survive rather than point h, although point a
is not on the convex hull. There are thus two aspects that need attention: the sorting
strategy and the selection scheme. A suitable sorting strategy and selection scheme
should be considered in EMOAs for ROCCH maximization no matter which classifier is
involved.

2.4. The motivation and ideas for new multi-objective algorithms for ROCCH maximization

We need to consider how to exploit the special characteristic of the ROCCH to make
multi-objective optimization techniques more efficient at solving the ROCCH
maximization problem. The main technique in MOP is ranking the population in order to
select the solutions that survive into the next generation. The most common ranking
approach consists of two steps: first, the population is sorted into several levels
indicating priority; then, a selection scheme is used to choose winners among
solutions at the same level. For the ROCCH maximization problem, the convex hull idea
is first incorporated into the sorting strategy. However, because of the strict nature
of the convex hull, this would make diversity decrease quickly during the evolutionary
process, so we design convex hull-based sorting without redundancy to sort the
population. The second idea is to use an area-based selection scheme, because the
target is to maximize the area under the convex hull rather than the hypervolume or
the crowding distance. Convex hull-based sorting without redundancy and the area-based
selection scheme are described in detail in Section 3.





Figure 4: Convex hull-based sorting with and without redundancy. Area-based contribution to ROCCH 



3. Convex Hull-based Multi-objective Genetic Programming (CH-MOGP) 

In this section, we describe the proposed convex hull-based multi-objective genetic
programming approach for maximizing the ROCCH. Firstly, convex hull-based sorting
without redundancy is used to rank the individuals in the union population into
several levels, which represent different priorities for survival, as in NSGA-II.
Secondly, because the target is to maximize the area under the convex hull (AUCH)
rather than the hypervolume used in SMS-EMOA, an area-based indicator is designed to
calculate the contribution of each individual to the AUCH. One major disadvantage of
the (μ + 1) selection strategy employed in SMS-EMOA and AG-EMOA is that it needs to
call fast nondominated sorting μ times to select μ offspring. In [25], an approximate
(μ + μ) scheme is proposed to make the selection process faster, and this idea has
been adopted in CH-MOGP.

3.1. Convex hull-based without redundancy sorting 

First of all, we introduce convex hull-based without redundancy sorting in this
subsection. The main idea is to keep the diversity of the population by force: all
redundant solutions are put into an archive, from which solutions are randomly
selected to survive into the next generation only if there are not enough
non-redundant solutions to fill the whole population. Non-redundant solutions with
poor performance thus have a chance to be kept, at the expense of redundant solutions
with good performance, in order to maintain high diversity. This prevents solutions on
the convex hull from being copied many times in the selection phase of evolutionary
multi-objective optimization. As described in Alg. 3, the population is split into a
redundant part and a remaining part. The remaining part is sorted into several levels
by convex hull-based sorting (Alg. 1), and the redundant part is taken as the last
level, whose members are candidates for random selection.

Algorithm 1 Convex hull-based-sorting-without-redundancy (Q, r)
Require: Q ≠ ∅
 1: Q is a solution set
 2: r is the reference point
Ensure: ch-based-sorting-without-redundancy
 3: i = 0
 4: while Q ≠ ∅ do
 5:   T = Q ∪ {r}
 6:   F_i = Jarvis-Algorithm(T)
 7:   F_i = Elimination(F_i)  // Some points in F_i are not interesting and are removed
 8:   Q = Q - F_i
 9:   i = i + 1
10: end while

In Fig. 4, the first and second graphs illustrate convex hull-based sorting with and
without redundancy. All redundant individuals are placed in the last level and are
randomly selected for the next generation only if necessary.
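The level structure produced by this sorting can be sketched in Python as follows. This is only an approximation of Algorithm 1 under several assumptions that the text does not confirm: an individual is treated as redundant when its (fpr, tpr) pair duplicates that of an earlier individual; the hull of each level is computed with a monotone chain instead of the Jarvis algorithm; the corners (0, 0) and (1, 1) stand in for the reference point and the elimination step; and any individuals left on or below the diagonal are placed together in one final level. All names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of the points, ordered by increasing fpr."""
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def ch_sort_without_redundancy(perf):
    """Split individuals into hull levels plus a list of redundant individuals.

    perf holds one (fpr, tpr) pair per individual. Returns (levels, redundant),
    where levels is a list of lists of indices (best level first).
    """
    remaining = {}   # distinct performance point -> index of its first individual
    redundant = []   # indices whose performance duplicates an earlier individual
    for i, p in enumerate(perf):
        if p in remaining:
            redundant.append(i)
        else:
            remaining[p] = i
    anchors = [(0.0, 0.0), (1.0, 1.0)]  # assumed stand-in for the reference point
    levels = []
    while remaining:
        hull = upper_hull(list(remaining) + anchors)
        level = [remaining[p] for p in hull if p in remaining]
        if not level:
            # Nothing left above the diagonal: close with one final level.
            level = list(remaining.values())
        for i in level:
            del remaining[perf[i]]
        levels.append(level)
    return levels, redundant
```

In this sketch the redundant list plays the role of the last level: it would only be sampled from when the non-redundant levels cannot fill the population.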



Algorithm 2 DeltaArea (Q)
Require: Q ≠ ∅
 1: Q is a solution set
Ensure: DeltaArea
 2: m = sizeof(Q)
 3: E is the performance of Q
 4: DeltaH_1, ..., DeltaH_m ← 0
 5: if m < 3 then
 6:   Set DeltaH_1, ..., DeltaH_m ← ∞
 7: else
 8:   Set DeltaH_1, DeltaH_m ← ∞
 9:   for 2 ≤ i ≤ sizeof(Q) - 1 do
10:     DeltaH_i = 0.5 × det((E_i - E_{i-1}) ∘ (E_{i+1} - E_{i-1}))
11:   end for
12:   while sizeof(Q) > 2 do
13:     r ← argmin{DeltaH}
14:     Q ← Q \ {Q_r}
15:     Update(DeltaH_{r-1}, DeltaH_{r+1})
16:   end while
17: end if
18: Return (DeltaH)

Algorithm 3 Reduce (Q, N)
Require: Q ≠ ∅
 1: Q is a solution set
 2: N is the number of solutions that will be discarded
Ensure: Reduce
 3: F = empty
 4: Split Q into two subpopulations U and R  // R is the collection of redundant individuals
 5: if sizeof(R) >= N then
 6:   F ← Random select N solutions from R
 7:   Q ← U ∪ R \ F
 8: else
 9:   F ← R
10:   ℜ_1, ..., ℜ_v ← Convexhull-based-sort-without-redundancy(Q)
11:   for i = v ... 1 do
12:     if sizeof(F) + sizeof(ℜ_i) < N then
13:       F ← F ∪ ℜ_i
14:       U = U \ ℜ_i
15:     else
16:       break
17:     end if
18:   end for
19:   T ← Select (N - sizeof(F)) solutions with minimal DeltaArea(ℜ_i)
20:   F ← F ∪ T
21:   U ← U \ T
22:   Q ← U
23: end if
24: Return (Q)
3.2. Area-based Selection Scheme

In this subsection, we describe the area-based indicator used for the selection scheme
in the new EMOA. The reason for adopting an area-based rather than a
hypervolume-based contribution is that we need to maximize the area under the convex
hull, for which an area-based indicator is more direct and efficient. The third graph
of Fig. 4 shows the area calculation in two dimensions. The contribution of a point x
with performance vector X to the area is the area of the triangle formed by the point,
its predecessor l and its successor u, with performance vectors L and U. Alg. 2 gives
the procedure for calculating the area contribution, and Eq. (2) gives the area
contribution of each point to its convex hull front:

\[
\Delta_{area} = \frac{\det\big((X - L) \circ (U - X)\big)}{2} \tag{2}
\]
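The triangle area of Eq. (2) and the initial contributions computed in Alg. 2 can be illustrated with a short Python sketch. Two assumptions are made here that the text does not state: the absolute value of the determinant is taken so that the area is non-negative regardless of the orientation of the three points, and the front is sorted by fpr before neighbours are identified. The iterative removal loop of Alg. 2 is omitted, and the function names are hypothetical.

```python
import math


def triangle_area(l, x, u):
    """Area of the triangle (l, x, u), i.e. |det([x - l, u - x])| / 2."""
    det = (x[0] - l[0]) * (u[1] - x[1]) - (x[1] - l[1]) * (u[0] - x[0])
    return abs(det) / 2.0


def area_contributions(front):
    """Area contribution of each point of a front, in order of increasing fpr.

    The two boundary points get an infinite contribution so they are never
    the first to be discarded, mirroring the initialisation in Alg. 2.
    """
    pts = sorted(front)
    n = len(pts)
    delta = [math.inf] * n
    for i in range(1, n - 1):
        delta[i] = triangle_area(pts[i - 1], pts[i], pts[i + 1])
    return delta
```

For the front (0, 0), (0, 1), (1, 1), the middle point contributes a triangle of area 0.5, which is exactly the area that would be lost if it were removed.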



Algorithm 4 CH-MOGP (Max, N)
Require: Max > 0, N > 0
 1: Max is the maximum of evaluations
 2: N is the population size
Ensure: CH-MOGP
 3: P_0 = init()
 4: t = 0
 5: m = 0
 6: while m < Max do
 7:   Q_t = empty
 8:   for i = 1 : N do
 9:     q_i ← Operators on P_t
10:     Q_t ← Q_t + q_i
11:   end for
12:   P_{t+1} ← Reduce(P_t ∪ Q_t, N)
13:   t ← t + 1
14:   m ← m + N
15: end while



3.3. CH-MOGP 

Alg. 4 describes the CH-MOGP algorithm. The framework is very similar to those of
SMS-EMOA and NSGA-II. However, we employ convex hull-based sorting without redundancy
to rank the individuals into different levels, and the (μ + μ) scheme is adopted in
CH-MOGP to maximize ROC performance. Because the target is to maximize the area under
the convex hull, area-based selection is used instead of the hypervolume-based
contribution, so that survivors with a high area-based contribution are kept.

In Alg. 4, the population size and the maximum number of evaluations are given first.
The initial population consists of a group of solutions represented by genetic
decision trees [29], generated with the ramped-half-and-half method [30]. To generate
offspring, two operators are employed, which are described in detail in [31]. As in
other EMOAs, the selection part of CH-MOGP relies on two schemes: one sorts the
population into different levels, and the other ranks the solutions within the same
level. Convex hull-based without redundancy sorting and the area-based selection
scheme play the main role in the selection part of CH-MOGP. To reduce the number of
calls to the sorting procedure, we also use the (μ + μ) scheme rather than the (μ + 1)
scheme of SMS-EMOA.

4. Experimental Studies 

4.1. Data Set 

Nineteen data sets are selected from the UCI repository [32] and described in Table 2.
In addition, we choose another three large-scale data sets, described in the last row
of Table 2, to make the results more solid. In this paper, we focus on binary
classification, so all the data sets are two-class problems. Balanced and imbalanced
benchmark data sets are carefully selected. The size of these data sets, in terms of
the number of instances, ranges from hundreds to thousands.

Table 1: Algorithms Involved

Name         Sorting             Selection           Scheme
CH-MOGP      CH-No-Redundancy    Area                μ + μ
RCHH-EMOA    CH-No-Redundancy    Area                μ + 1
CH-EMOA      Convex Hull         Hypervolume         μ + 1
CHCrowding   CH-No-Redundancy    Crowding-distance   μ + μ
CHH-MOGP     Convex Hull         Area                μ + 1
NSGA-II      Non-dominated       Crowding-distance   μ + μ
SMS-EMOA     Non-dominated       Hypervolume         μ + 1
MOEA/D       Fitness             Fitness



4.2. Algorithms Involved 

To evaluate the performance of the two strategies proposed in this paper, Table 1
lists the algorithms involved, which allow rigorous and sufficient experimental
comparisons. Generally speaking, the experiment is designed around three components of
an EMOA. The first is the sorting strategy, including convex hull-based sorting with
and without redundancy and non-dominated sorting (MOEA/D, however, is a
decomposition-based MOEA with a different framework). The second is the indicator used
in the selection scheme, including area-based, hypervolume-based and
crowding-distance-based indicators. The last concerns the (μ + μ) and (μ + 1) schemes
used by the different EMOAs.



Table 2: Nineteen UCI Data Sets

Data Set            No. of features    Class Distribution
australian          14                 383:307
(name illegible)    16                 168:267
(name illegible)    8                  268:500
(name illegible)    9                  458:241
ionosphere          34                 225:126
(name illegible)    60                 97:111
(name illegible)    15                 307:383
kr-vs-kp            36                 1669:1527
monks-3             6                  228:204
(name illegible)    4                  178:570
mammographic        5                  445:516
(name illegible)    22                 212:55
german              24                 700:300
monks-1             6                  216:216
(name illegible)    22                 147:48
wdbc                30                 212:357
(name illegible)    6                  290:142
(name illegible)    9                  626:332
bands               36                 228:312
Table 3: Evaluation Times for each algorithm on 19 UCI Data Sets

Data Set        No. of Evaluations
australian      100000
german          200000
mammographic    60000
parkinsons      30000
tic-tac-toe     300000
magic04         10000
sonar           30000
wdbc            30000
spect           40000
adult           10000

Remaining entries (row alignment lost in the source):
Data sets: monks-1, transfusion, ionosphere, monks-2, monks-3
No. of Evaluations: 150000, 30000, 200000, 80000, 22000, 10000, 50000, 80000,
1000000, 50000, 200000, 40000

Table 4: Parameters for 8 algorithms

Parameter                Value
Objective                Maximize Convex hull in ROC
Terminals of GP          {0,1} with 1 representing "Positive"; 0 representing "Negative"
Function set of GP       If-then-else, and, or, not, >, <, =
Data sets                22 UCI data sets
Algorithms               8 algorithms in Table 1
Crossover rate           0.9
Mutation rate            0.1
Shifting rate            0.1
Splitting rate           0.1
Parameters for GP        P (Population size) = 20;
                         G (Maximum Evaluation Times) = M;
                         Number of Runs: 5-fold cross-validation 20 times
Termination criterion    Maximum of G of evaluation time has been reached
Selection strategy       Tournament selection, Size = 4
Max depth of             3/17


4.3. Evaluation and Configuration 

Evaluation: To evaluate the generalization performance of the classifiers produced by
the different algorithms, cross-validation is employed. We apply each algorithm to
each of the 22 data sets with five-fold cross-validation repeated 20 times. Because we
want to emphasize that CH-MOGP can perform better with fewer evaluations, we run each
of the compared algorithms with enough evaluations for it to converge. Table 3 gives
the number of evaluations used on each data set.
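The evaluation protocol can be sketched as a generator of train/test index splits. This is a minimal Python illustration under assumptions not stated in the text: folds are drawn by a plain random shuffle without stratification, and the seed handling and the function name `repeated_kfold` are hypothetical.

```python
import random


def repeated_kfold(n, k=5, repeats=20, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for every repetition
        folds = [idx[i::k] for i in range(k)]
        for f in range(k):
            test = folds[f]
            train = [j for g, fold in enumerate(folds) if g != f for j in fold]
            yield r, f, train, test
```

With the default arguments this yields 100 train/test splits per data set, one for each fold of each of the 20 repetitions.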

Configuration: We use the representation called GDT [29] for the individuals in all
multi-objective evolutionary algorithms. For binary classification problems, 0 and 1
(standing for negative and positive) are selected as the terminals of GP. Every
classifier (individual) is constructed as an if-then-else tree that involves and, or,
not, >, < and = as operator symbols. Most offspring are obtained by the crossover
operator, applied with probability 0.9. We also employ the shifting and splitting
operators described in [33], each with probability 0.1. Tournament selection is
adopted as the selection strategy, with a tournament size of 4. To avoid overfitting,
the maximum depth of each individual tree is limited to 17.



4.4. Results and Analysis 

Fig. 7, Fig. 6, Table 5 and Table 6 show the performance of CH-EMOA compared with
other EMOAs on 22 data sets. Generally speaking, CH-EMOA performs better not only in
terms of AUCH but also in terms of time cost.

In this subsection, we want to answer the following questions:

1. Why is convex hull-based sorting without redundancy better than traditional convex
hull-based sorting?

2. Is convex hull-based sorting without redundancy better than non-dominated sorting
for the ROCCH maximization problem?

3. Is the area-based selection scheme comparable with or better than
crowding-distance or hypervolume-based selection?

4. Is CH-MOGP better than NSGA-II, SMS-EMOA and MOEA/D for ROCCH maximization?

5. Does CH-MOGP show advantages over traditional machine learning algorithms?

Figure 5: The diversity in convex hull-based sorting with and without redundancy
affects the performance of the results

To evaluate the ideas we have proposed, we use the 19 data sets in Table 2 with the
algorithms described in Table 1.



Table 5: Performance of four different frameworks of MOGP on UCI data sets; mean and
standard deviation, multiplied by 100, are given in this table

Data Set            CH-MOGP         SMS-EMOA        NSGA-II         MOEA/D
australian          91.49 ± 2.72    91.67 ± 2.48    91.16 ± 2.41    90.29 ± 2.75
bcw                 97.94 ± 1.20    97.73 ± 1.56    97.84 ± 1.41    97.48 ± 1.48
german              73.10 ± 3.24    73.32 ± 3.33    72.39 ± 3.07    71.45 ± 2.85
ionosphere          91.07 ± 4.95    90.51 ± 4.52    90.45 ± 4.53    89.89 ± 4.83
mammographic        89.75 ± 2.01    89.48 ± 1.94    89.41 ± 1.87    87.50 ± 2.23
monks-2             91.05 ± 8.00    89.28 ± 5.58    90.53 ± 5.19    73.26 ± 9.14
parkinsons          86.79 ± 6.86    85.11 ± 6.68    84.90 ± 7.54    83.94 ± 6.72
sonar               79.42 ± 5.87    78.04 ± 5.91    77.79 ± 7.34    75.75 ± 5.66
tic-tac-toe         83.40 ± 10.4    79.56 ± 11.1    79.07 ± 13.4    70.85 ± 10.4
wdbc                96.78 ± 1.92    96.49 ± 2.25    96.70 ± 2.11    95.90 ± 2.19
(name illegible)    77.00 ± 4.05    76.38 ± 4.09    75.54 ± 3.56    71.85 ± 3.82
(name illegible)    91.30 ± 2.45    91.16 ± 2.33    91.14 ± 2.36    89.88 ± 2.51
(name illegible)    97.94 ± 1.56    97.69 ± 1.59    97.74 ± 1.71    97.15 ± 1.75
kr-vs-kp            98.40 ± 0.89    98.63 ± 0.75    98.39 ± 0.79    96.67 ± 1.43
monks-1             99.70 ± 1.68    97.62 ± 3.71    99.62 ± 1.35    96.51 ± 5.69
monks-3             99.81 ± 0.43    99.74 ± 0.45    99.45 ± 2.87    99.07 ± 0.88
pima                80.08 ± 3.38    79.85 ± 3.38    79.29 ± 3.70    76.93 ± 3.10
spect               77.38 ± 7.36    76.27 ± 7.14    76.91 ± 8.46    74.88 ± 6.43
transfusion         71.62 ± 4.62    71.48 ± 4.47    71.49 ± 4.84    68.77 ± 4.63


Table 6: Performance of four different frameworks of MOGP on three big data sets; mean
and standard deviation, multiplied by 100, are given in this table

Data Set    CH-MOGP         SMS-EMOA        NSGA-II         MOEA/D
adult       84.58 ± 1.40    82.53 ± 2.15    84.01 ± 1.38    77.04 ± 2.54
skin        97.10 ± 1.11    95.46 ± 1.85    96.57 ± 1.25    93.20 ± 2.37
magic04     83.02 ± 1.04    81.76 ± 1.57    82.01 ± 1.19    76.39 ± 3.07

4.4.1. Question 1

As we argued above, because of the greedy sort of convex hull-based sorting, the 
diversity will decrease fast as the generation or evaluation times. Fig. [5] shows the 
performance of CHH-EMOA and RCHH-EMOA which has been described in Table.[T] 
The only different between these two algorithms is the sorting scheme, CHH-EMOA 
adopts traditional convex hull-based sorting and RCHH-EMOA employs the convex 
hull-based sorting without redundancy approach. The third and fourth graph in Fig. |5] 
give the number of different individuals in the convex hull and in the whole population 
which are simply indicated as the measurement for the diversity. Obviously, RCHH- 
EMOA with larger diversity performs better than CHH-EMOA is the first and second 
graph in Fig. |5] which describe the AUCH performance in traning and test data set 
(Here, we take data set "Sonar" as an example). However, we also give the Wilcoxon- 
Sum-Rank-Test results (Which is with a condence level of 0.95) of RCHH-EMOA and 
CHH-EMOA for 19 data sets in Table. |7] Generally speaking, RCHH-EMOA with 
convex hull-based sorting without redundancy is better than CHH-EMOA. 
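
The quantities discussed in this paragraph (the ROC convex hull, the area under it, and the count of distinct individuals) can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: each individual is represented only by its (fpr, tpr) point, two individuals count as "different" when their points differ, and the function names `roc_convex_hull`, `auch` and `diversity` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (fpr, tpr) of one classifier


def cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of OA and OB; negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of the points plus the trivial classifiers (0,0), (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the new segment.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def auch(points: List[Point]) -> float:
    """Area under the convex hull, by the trapezoid rule over hull segments."""
    hull = roc_convex_hull(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(hull, hull[1:]))


def diversity(points: List[Point]) -> Tuple[int, int]:
    """(distinct points in the population, distinct population points on the hull).

    Assumption: distinctness is judged in objective space only.
    """
    distinct = set(points)
    on_hull = distinct & set(roc_convex_hull(points))
    return len(distinct), len(on_hull)
```

For example, a population containing the same point twice contributes only one distinct individual, which is the kind of redundancy that the sorting scheme without redundancy is meant to penalize.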

Table 7: Wilcoxon rank-sum test on 19 UCI data sets. The table shows the Wilcoxon rank-sum test results
between RCHH-EMOA and CHH-EMOA on 19 UCI data sets at different evaluation times. Each x-y-z in the
table means that RCHH-EMOA wins x times, draws y times and loses z times. Ratio is the fraction of the
total number of evaluations.

Ratio       1/15     1/10     1/4      1/3      1/2      2/3      1
CHH-EMOA    4-15-0   5-14-0   5-14-0   6-13-0   6-13-0   6-13-0   4-15-0
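
The x-y-z entries in the tables of this section can be tallied as in the following sketch. It is only an illustration under stated assumptions: the rank-sum test uses the normal approximation with average ranks for ties (no tie correction of the variance), a confidence level of 0.95 is read as a significance threshold `alpha = 0.05`, a significant result is credited to the algorithm with the higher ranks, and the helper names `rank_sum_test` and `tally` are hypothetical. The paper does not state which implementation was used.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) of the Wilcoxon rank-sum test (normal approximation)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n1, n2 = len(a), len(b)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2
    std = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (rank_sum_a - mean) / std
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def tally(results_a: List[Sequence[float]],
          results_b: List[Sequence[float]],
          alpha: float = 0.05) -> str:
    """Wins-draws-losses of algorithm A against B, one entry per data set.

    Each element of results_a / results_b holds the AUCH values of the
    repeated runs on one data set (higher is better).
    """
    wins = draws = losses = 0
    for a, b in zip(results_a, results_b):
        z, p = rank_sum_test(a, b)
        if p >= alpha:
            draws += 1      # no significant difference
        elif z > 0:
            wins += 1       # A ranks significantly higher
        else:
            losses += 1
    return f"{wins}-{draws}-{losses}"
```

With this convention, an entry such as 4-15-0 means that the first algorithm was significantly better on 4 data sets, statistically indistinguishable on 15, and never significantly worse.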



4.4.2. Question 2 

CHCrowding and NSGA-II are used to answer question 2. As described in Table 1,
both employ crowding distance in their selection scheme but adopt different sorting
approaches: CHCrowding uses convex hull-based sorting without redundancy, whereas
NSGA-II uses fast non-dominated sorting. This is the only difference between them.
Table 8 shows the Wilcoxon rank-sum test results (at a confidence level of 0.95)
for the two algorithms. CHCrowding never loses to NSGA-II and wins on some data sets.

Table 8: Wilcoxon rank-sum test on 19 UCI data sets. The table shows the Wilcoxon rank-sum test results
between CHCrowding and NSGA-II on 19 UCI data sets at different evaluation times. Each x-y-z in the
table means that CHCrowding wins x times, draws y times and loses z times. Ratio is the fraction of the
total number of evaluations.

Ratio      1/15     1/10     1/4      1/3      1/2      2/3      1
NSGA-II    3-16-0   2-17-0   2-17-0   3-16-0   3-16-0   3-16-0   1-18-0



4.4.3. Question 3 

For question 3, we make two comparisons. The first is between CHH-EMOA and
CH-EMOA, which are the same except for the selection scheme: CHH-EMOA uses
area-based selection, whereas CH-EMOA uses the hypervolume contribution. The
Wilcoxon rank-sum test results (at a confidence level of 0.95) for them are given
in Table 9. Area-based selection works better than the hypervolume contribution
when combined with convex hull-based sorting in a multi-objective optimization
algorithm. The second comparison, between CHCrowding and CH-MOGP, measures the
difference between area-based selection and crowding-distance selection. Table 10
shows no difference between them on the 19 data sets. One reason is that convex
hull-based sorting without redundancy plays a more important role in these
algorithms than the selection scheme, although a selection scheme is still needed
by the EMOAs. Although area-based and crowding-distance-based selection show no
difference in these two algorithms, we still choose area-based selection because
it is more intuitive for maximizing ROC performance.
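
To make the two selection criteria compared above concrete, the sketch below scores the points of one front in two ways. It is a simplified illustration, not the paper's procedure: the area-based score is assumed here to be the loss in AUCH when a point is removed, the crowding distance follows the usual NSGA-II definition, and the names `area_contribution` and `crowding_distance` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (fpr, tpr)


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def auch(points: List[Point]) -> float:
    """Area under the upper convex hull of the points plus (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(hull, hull[1:]))


def area_contribution(points: List[Point]) -> List[float]:
    """Assumed area-based score: AUCH lost when each point is left out."""
    total = auch(points)
    return [total - auch(points[:i] + points[i + 1:])
            for i in range(len(points))]


def crowding_distance(points: List[Point]) -> List[float]:
    """NSGA-II style crowding distance over the two objectives."""
    n = len(points)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(2):
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # boundary points
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = points[order[k + 1]][m] - points[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist
```

Under this assumed definition a duplicated point has an area contribution of zero, since removing one copy leaves the hull unchanged, whereas crowding distance only reflects the spacing between neighbours and not the effect on the area.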



Table 9: Wilcoxon rank-sum test on 19 UCI data sets. The table shows the Wilcoxon rank-sum test results
between CHH-EMOA and CH-EMOA on 19 UCI data sets at different evaluation times. Each x-y-z in the
table means that CHH-EMOA wins x times, draws y times and loses z times. Ratio is the fraction of the
total number of evaluations.

Ratio      1/15     1/10     1/4      1/3      1/2      2/3      1
CH-EMOA    3-16-0   4-15-0   4-15-0   4-15-0   4-15-0   6-13-0   5-14-0



Table 10: Wilcoxon rank-sum test on 19 UCI data sets. The table shows the Wilcoxon rank-sum test results
between CH-MOGP and CHCrowding on 19 UCI data sets at different evaluation times. Each x-y-z in the
table means that CH-MOGP wins x times, draws y times and loses z times. Ratio is the fraction of the
total number of evaluations.

Ratio         1/15     1/10     1/4      1/3      1/2      2/3      1
CHCrowding    0-19-0   0-19-0   0-19-0   0-19-0   0-19-0   0-19-0   0-19-0



4.4.4. Question 4 

AUCH analysis: To answer question 4, we employ more data sets, in particular big
data sets, because we emphasize that our algorithm performs better with fewer
evaluations, which saves a lot of time on problems with expensive evaluation.
Table 12 describes three big data sets. Tables 5 and 6 give the results of four
different evolutionary multi-objective algorithms combined with GDT for maximizing
the area under the convex hull in ROC space. Furthermore, Table 11 gives the
Wilcoxon rank-sum test results (at a confidence level of 0.95) for them. To compare
the performance of all algorithms at each stage of the evolutionary process, we show
the results at 1/15, 1/10, 1/4, 1/3, 1/2, 2/3 and 1 of the whole process. CH-MOGP
clearly outperforms the other EMOAs.

The Performance and Evaluation Times: Fig. 7 and Fig. 6 show the performance
of CH-MOGP, SMS-EMOA, NSGA-II and MOEA/D on 22 data sets. Actually, we
Table 11: Wilcoxon rank-sum test on 22 UCI data sets. The table shows the Wilcoxon rank-sum test results
between CH-MOGP and the other three EMOAs (NSGA-II, SMS-EMOA and MOEA/D) on 22 UCI data sets
at different evaluation times. Each x-y-z in the table means that CH-MOGP wins x times, draws y times
and loses z times. Ratio is the fraction of the total number of evaluations.

Ratio        1/15     1/10     1/4      1/3      1/2      2/3      1

19 data sets
NSGA-II      4-15-0   4-15-0   2-17-0   4-15-0   5-14-0   5-14-0   4-15-0
SMS-EMOA     11-8-0   11-8-0   6-13-0   5-14-0   4-15-0   4-15-0   5-14-0
MOEA/D       19-0-0   19-0-0   19-0-0   19-0-0   19-0-0   19-0-0   19-0-0

3 big data sets
NSGA-II      0-3-0    1-2-0    1-2-0    1-2-0    2-1-0    2-1-0    2-1-0
SMS-EMOA     3-0-0    3-0-0    3-0-0    3-0-0    3-0-0    3-0-0    3-0-0
MOEA/D       3-0-0    3-0-0    3-0-0    3-0-0    3-0-0    3-0-0    3-0-0


Table 12: Three Large-scale Data Sets

Data Set   No. of features   Class distribution
skin       4                 50859:194198
magic04    10                12332:6688
adult      14                11687:37155



Table 13: Evaluation times for each algorithm on 22 UCI data sets

Data Set       No. of Evaluations
australian     100000
bands          1500000
bcw            18500
crx            450000
german         120000
house-votes    24000
ionosphere     80000
kr-vs-kp       2000000
mammographic   80000
monks-1        230000
monks-2        10000000
monks-3        190000
parkinsons     42000
pima           180000
sonar          12000
spect          10000
tic-tac-toe    3000000
transfusion    35000
wdbc           21000
adult          300000
magic04        40000
skin           30000











Table 14: Performance of CH-MOGP and traditional classifiers on UCI data sets. Mean and standard
deviation, multiplied by 100, are given in this table.

Data set       CH-MOGP        C4.5           NB             PRIE
australian     91.97 ± 2.53   85.52 ± 4.05   89.47 ± 2.78   91.75 ± 2.36
bands          78.50 ± 3.56   74.56 ± 4.59   73.91 ± 4.68   76.07 ± 4.81
bcw            98.17 ± 1.06   95.05 ± 2.55   98.92 ± 0.62   98.16 ± 1.09
crx            9l.S2±2.n      85.51 ± 3.94   87.88 ± 3.16   90.65 ± 2.77
german         74.27 ± 2.79   65.36 ± 4.74   78.42 ± 2.94   75.95 ± 3.25
house-votes    98.23 ± 1.26   96.35 ± 2.04   98.05 ± 1.04   97.80 ± 1.49
ionosphere     92.42 ± 3.66   88.20 ± 5.65   93.57 ± 3.18   93.68 ± 4.23
kr-vs-kp       99.40 ± 0.26   99.71 ± 0.23   93.21 ± 1.00   98.26 ± 0.44
mammographic   90.20 ± 1.76   87.66 ± 2.21   89.77 ± 1.96   89.70 ± 2.02
monks-1        100.0 ± 0.00   77.13 ± 6.90   73.18 ± 4.58   70.93 ± 5.59
monks-2        95.68 ± 4.61   94.17 ± 5.93   52.38 ± 7.04   51.25 ± 6.16
monks-3        100.0 ± 0.00   100.0 ± 0.00   95.94 ± 2.17   99.60 ± 0.27
parkinsons     86.10 ± 6.66   78.91 ± 9.76   85.91 ± 6.11   88.24 ± 5.83
pima           80.74 ± 3.12   75.23 ± 4.93   81.40 ± 3.01   79.58 ± 2.92
sonar          81.44 ± 5.15   73.85 ± 7.84   80.12 ± 7.03   69.92 ± 8.64
spect          78.56 ± 7.44   76.88 ± 8.91   84.09 ± 6.03   83.51 ± 7.01
tic-tac-toe    90.07 ± 8.88   84.91 ± 13.9   61.50 ± 14.7   70.41 ± 12.5
transfusion    72.19 ± 4.89   71.08 ± 5.08   70.93 ± 4.94   70.87 ± 5.39
wdbc           97.32 ± 1.40   92.74 ± 3.16   98.14 ± 1.33   96.58 ± 1.94
adult          88.97 ± 0.37   88.89 ± 0.53   85.27 ± 0.37   90.37 ± 0.25
magic04        87.16 ± 0.74   86.76 ± 0.83   75.70 ± 0.74   85.37 ± 0.76
skin           99.49 ± 0.11   99.93 ± 0.02   94.17 ± 0.07   98.15 ± 0.08



Table 15: Wilcoxon rank-sum test on 22 UCI data sets. The table shows the Wilcoxon rank-sum test results
between CH-MOGP and three other machine learning algorithms on 22 UCI data sets. Each entry x-y-z
means that the algorithm in the row wins x times, draws y times and loses z times against the
algorithm in the column.

           CH-MOGP   C4.5     NB       PRIE
CH-MOGP    -         15-5-2   11-6-5   13-4-5
C4.5       -         -        8-2-12   8-1-13
NB         -         -        -        6-6-10
PRIE       -         -        -        -












give the convergence of these EMOAs on the training and test data sets, using 5-fold
cross-validation repeated 20 times. Generally speaking, the curves of CH-MOGP lie
above the others on most data sets. In other words, for a given and very limited
number of evaluations, CH-MOGP can perform better than the other EMOAs in the
classification task.
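
The evaluation protocol of 5-fold cross-validation repeated 20 times can be sketched as follows. The paper does not say whether the folds were stratified or how they were seeded, so stratification by class label, the `seed` parameter and the function name `repeated_stratified_kfold` are assumptions made for this illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def repeated_stratified_kfold(
    labels: Sequence[int], k: int = 5, repeats: int = 20, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for `repeats` rounds of k-fold CV.

    Assumption: folds are stratified, i.e. each class is dealt out
    round-robin so that every fold keeps roughly the class distribution.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds: List[List[int]] = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)  # new random partition in every repetition
            for pos, i in enumerate(idx):
                folds[pos % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test
```

With k = 5 and 20 repetitions this yields 100 train/test splits per data set, so each reported mean and standard deviation would summarize 100 AUCH values under this reading of the protocol.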



4.4.5. Question 5 

AUCH comparison: In this sub-section, we compare CH-MOGP with C4.5 [34],
Naive Bayes (NB) [35] and PRIE [7], which are traditional machine learning algorithms
for constructing classifiers. To make a fair comparison, we set the population size of
CH-MOGP to 100. The reason is that soft classifiers usually output scores or
probabilities for the test data, and the number of distinct scores or probabilities
determines the number of performance points in ROC space; that number is usually not
small. We therefore choose a general number, 100, as the population size of CH-MOGP.
A larger population size needs more evaluations, so Table 13 gives the number of
evaluations for CH-MOGP on the 22 data sets. Table 14 shows the results of CH-MOGP,
C4.5, NB and PRIE on all data sets, and the Wilcoxon rank-sum test results (at a
confidence level of 0.95) are given in Table 15.

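
The link between the scores of a soft classifier and its number of ROC points can be seen in a short sketch. It assumes binary labels coded as 0 and 1 with both classes present, and predicts positive when the score is at least the threshold; the function name `roc_points` is hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """One (fpr, tpr) point per distinct score used as threshold, plus (0, 0)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points
```

A classifier that outputs d distinct scores therefore yields d + 1 ROC points, which is why a soft classifier such as NB is compared here against a whole population of GP classifiers rather than a single one.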
Evaluation Times: 



Table 16: Times for CH-MOGP, C4.5, NB and PRIE to construct classifiers to maximize ROCCH

Time (s)       CH-MOGP     C4.5    NB     PRIE
australian     116.91      0.06    0.02   4.18
bands          2242.5      0.04    0.03   15.85
bcw            28.63       0.01    0.02   0.53
crx            653.45      0.02    0.02   2.92
german         234.27      0.16    0.04   4.79
house-votes    13.2        0.01    0.02   0.48
ionosphere     59.51       0.04    0.02   5.77
kr-vs-kp       12389.37    0.27    0.22   1.58
mammographic   95.75       0.01    0.02   0.87
monks-1        174.67      0.01    0.02   0.29
monks-2        8558.14     0.01    0.02   0.3
monks-3        83.49       0.01    0.02   0.31
parkinsons     17.48       0.01    0.02   1.62
pima           206.04      0.02    0.02   16.46
sonar          129.28      0.03    0.02   31.45
spect          89.05       0.02    0.02   0.39
tic-tac-toe    5396.3      0.03    0.02   0.48
transfusion    28.98       0.01    0.02   4.34
wdbc           27.39       0.04    0.03   20.86
adult          15655.92    0.42    2.08   1771.73
magic04        7601.82     0.28    0.57   1103.U5
skin           91856.38    15.01   3.7    711.15



Table 16 gives the time taken by CH-MOGP, C4.5, NB and PRIE to construct
classifiers that maximize the ROCCH. The experimental environment is an 8-core CPU
at 2.13 GHz with 24 GB RAM. CH-MOGP consumes much more time than the others:
because of the metaheuristic character of EAs, GP needs to evaluate many classifiers
until it converges. In contrast, NB calculates a posterior probability and C4.5 uses
a greedy method to increase information gain. PRIE employs a greedy strategy to
construct classifiers (more than one, usually dozens) to maximize the ROCCH, so it
costs a little more time than NB and C4.5 but still much less than CH-MOGP. How to
reduce the evaluation time of CH-MOGP is therefore an important topic.

5. Conclusions and Future Work 

In this paper, we propose a convex hull-based sorting approach and an area-based
selection scheme for multi-objective genetic programming, with the aim of maximizing
ROC performance in classification tasks. First, we emphasized that the convex hull
maximization problem is similar to, but goes beyond, a multi-objective optimization
problem: traditional techniques are helpful but need to be improved to solve this
kind of problem. Instead of the fast non-dominated sorting approach used in NSGA-II
and SMS-EMOA, convex hull-based sorting is investigated in the new algorithm design,
and we found that convex hull-based sorting without redundancy is effective in
avoiding the loss of diversity during the search. An area-based selection scheme
with (μ + μ) is also designed to help rank the population. The new algorithm,
CH-MOGP, is evaluated on benchmarks and works better than other traditional EMOAs
and some traditional machine learning algorithms. In future work, three topics will
be discussed. The first is how to improve CH-MOGP to reduce its time consumption
while keeping comparable performance for ROCCH maximization. The second is that the
GP-based classifier could be replaced by other tree-based classifiers or other
traditional machine learning classifiers such as SVM or NB; different classifiers
might result in better performance for ROCCH maximization. The third is to apply
convex hull-based sorting without redundancy and the area-based selection scheme
not only to classification but also to other areas such as numerical optimization.